    A Likelihood Ratio Test of Speciation with Gene Flow Using Genomic Sequence Data

    Genomic sequence data may be used to test hypotheses about the process of species formation. In this paper, I implement a likelihood ratio test of variable species divergence times over the genome, which may be considered a test of the null model of allopatric speciation without gene flow against the alternative model of parapatric speciation with gene flow. Two models are implemented in the likelihood framework, which accommodate coalescent events in the ancestral populations in a phylogeny of three species. One model assumes a constant species divergence time over the genome, whereas the other allows it to vary. Computer simulation shows that the test has an acceptable false positive rate, but hundreds or even thousands of genomic loci may be necessary to achieve reasonable power. The test is applied to genomic data from the human, chimpanzee, and gorilla.

    Inference of gene flow between species under misspecified models

    Genomic data are informative about the history of species divergence and interspecific gene flow, including the direction, timing, and strength of gene flow. Nevertheless, gene flow in opposite directions generates similar patterns in multilocus sequence data, such as reduced sequence divergence between the hybridizing species, and as a result, inference of the direction of gene flow is challenging. Here we study the amount of information about the direction of gene flow in multilocus sequence alignments when the data are analyzed using likelihood-based methods under the multispecies coalescent with introgression (MSci) model. We analyze the case of two species, and use simulation to examine larger species trees. We found that it is easier to infer gene flow from a small population to a large one than in the opposite direction, and easier to infer inflow (gene flow from an outgroup species to an ingroup species) than outflow (gene flow from an ingroup species to an outgroup species). If introgression is assumed to occur in the wrong direction, the time of introgression tends to be correctly estimated, the Bayesian test of gene flow is often significant, and the estimated introgression probability may be even greater than the true value. We discuss factors that cause gene flow to be asymmetrical, including geography, behavior, and incompatibility of introgressed alleles with the host genomic background. We analyze a dataset of Heliconius butterflies to demonstrate that typical genomic datasets are informative for inferring the direction of interspecific gene flow, as well as its timing and strength.

    The Trouble with Sliding Windows and the Selective Pressure in BRCA1

    Sliding-window analysis has widely been used to uncover synonymous (silent, dS) and nonsynonymous (replacement, dN) rate variation along the protein sequence and to detect regions of a protein under selective constraint (indicated by dN < dS) or positive selection (indicated by dN > dS). The approach compares two or more protein-coding genes and plots the estimates d̂S and d̂N from each sliding window along the sequence. Here we demonstrate that the approach produces artifactual trends of synonymous and nonsynonymous rate variation, with greater variation in d̂S than in d̂N. Such trends are generated even if the true dS and dN are constant along the whole protein and different codons are evolving independently. Many published tests of negative and positive selection using sliding windows that we have examined appear to be invalid because they fail to correct for multiple testing. Instead, likelihood ratio tests provide a more rigorous framework for detecting signals of natural selection affecting protein evolution. We demonstrate that a previous finding that a particular region of the BRCA1 gene experienced a synonymous rate reduction driven by purifying selection is likely an artifact of the sliding-window analysis. We evaluate various sliding-window analyses in molecular evolution, population genetics, and comparative genomics, and argue that the approach is not generally valid if it is not known a priori that a trend exists and if no correction for multiple testing is applied.
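
    The multiple-testing concern in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' analysis: the function names, the window width, and the per-site rate are hypothetical, the simulated sites are simple independent 0/1 draws rather than codon substitutions, and the family-wise error formula assumes independent tests, which overlapping windows are not.

```python
import random


def sliding_window_means(values, width, step=1):
    """Mean of `values` in each window of `width` sites, moved by `step`."""
    return [
        sum(values[i:i + width]) / width
        for i in range(0, len(values) - width + 1, step)
    ]


def familywise_error(alpha, m):
    """Chance of at least one false positive in m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-test level that keeps the family-wise error at or below alpha."""
    return alpha / m


# Hypothetical setup: 300 independent sites, each "changed" with the same
# constant probability, so any trend across windows is sampling noise.
rng = random.Random(1)
sites = [1 if rng.random() < 0.3 else 0 for _ in range(300)]
windows = sliding_window_means(sites, width=30)

print(len(windows), min(windows), max(windows))
print(familywise_error(0.05, 20), bonferroni_threshold(0.05, 20))
```

    Even with a constant underlying rate, the window means fluctuate, and testing many windows at an uncorrected level makes at least one spurious "significant" region likely.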

    Dating Phylogenies with Sequentially Sampled Tips

    We develop a Bayesian Markov chain Monte Carlo (MCMC) algorithm for estimating divergence times using sequentially sampled molecular sequences. This type of data is commonly collected during viral epidemics and is sometimes available from different species in ancient DNA studies. We derive the distribution of ages of nodes in the tree under a birth-death-sequential-sampling (BDSS) model and use it as the prior for divergence times in the dating analysis. We implement the prior in the MCMCtree program in the PAML package for divergence dating. The BDSS prior is very flexible and, with different parameters, can generate trees of very different shapes, suitable for examining the sensitivity of posterior time estimates. We apply the method to a data set of SIV/HIV-2 genes in comparison with a likelihood-based dating method, and to a data set of influenza H1 genes from different hosts in comparison with the Bayesian program BEAST. We examined the impact of tree topology on time estimates and suggest that multifurcating consensus trees should be avoided in dating analysis. We found posterior time estimates for old nodes to be sensitive to the priors on times and rates and suggest that previous Bayesian dating studies may have produced overconfident estimates. [Bayesian inference; MCMC; molecular clock dating; sampled tips; viral evolution.]

    Estimation of Cross-Species Introgression Rates using Genomic Data Despite Model Unidentifiability

    Full likelihood implementations of the multispecies coalescent with introgression (MSci) model treat genealogical fluctuations across the genome as a major source of information to infer the history of species divergence and gene flow using multilocus sequence data. However, MSci models are known to have unidentifiability issues, whereby different models or parameters make the same predictions about the data and cannot be distinguished by the data. Previous studies have focused on heuristic methods based on gene trees and do not make efficient use of the information in the data. Here we study the unidentifiability of MSci models under the full likelihood methods. We characterize the unidentifiability of the bidirectional introgression (BDI) model, which assumes that gene flow occurs in both directions. We derive simple rules for arbitrary BDI models, which create unidentifiability of the label-switching type. In general, an MSci model with k BDI events has 2^k unidentifiable modes or towers in the posterior, with each BDI event between sister species creating within-model parameter unidentifiability and each BDI event between non-sister species creating between-model unidentifiability. We develop novel algorithms for processing Markov chain Monte Carlo (MCMC) samples to remove label-switching problems and implement them in the BPP program. We analyze real and synthetic data to illustrate the utility of the BDI models and the new algorithms. We discuss the unidentifiability of heuristic methods and provide guidelines for the use of MSci models to infer gene flow using genomic data.
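
    The 2^k count in the abstract above can be made concrete with a short Python sketch. It assumes, as the count implies, that each BDI event independently contributes a binary choice between an original and a mirrored labelling; the function name and the boolean encoding are hypothetical and do not reflect how BPP represents or processes these modes.

```python
from itertools import product


def label_switching_modes(k):
    """Enumerate every flip pattern for k BDI events.

    Each pattern is a tuple of k booleans; True marks an event whose
    labelling is mirrored relative to the reference mode (an assumed
    encoding used only for illustration).
    """
    return list(product((False, True), repeat=k))


modes = label_switching_modes(3)
print(len(modes))  # 2**3 patterns
for pattern in modes:
    print(pattern)
```

    The all-False pattern stands for the reference mode; the remaining patterns are its mirror images under one or more label switches.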

    Duplicated Paralogous Genes Subject to Positive Selection in the Genome of Trypanosoma brucei

    Background: Whole genome studies have highlighted duplicated genes as important substrates for adaptive evolution. We have investigated adaptive evolution in this class of genes in the human parasite Trypanosoma brucei, as indicated by the ratio of non-synonymous (amino acid changing) to synonymous (amino acid retaining) nucleotide substitution rates. Methodology/Principal Findings: We have identified duplicated genes that are most rapidly evolving in this important human parasite. This is the first attempt to investigate adaptive evolution in this species at the codon level. We identify 109 genes within 23 clusters of paralogous gene expansions to be subject to positive selection. Conclusions/Significance: Genes identified include surface antigens in both the mammalian and insect host life cycle stages, suggesting that competitive interaction is not solely with the adaptive immune system of the mammalian host. Surface transporters related to drug resistance and genes related to developmental progression are also detected. We discuss how adaptive evolution of these genes may highlight lineage-specific processes essential for parasite survival. We also discuss the implications of adaptive evolution of these targets for parasite biology and control.